Abstract. We prove tail estimates for variables of the form $\sum_i f(X_i)$, where $(X_i)_i$ is a sequence of states drawn from a reversible Markov chain, or, equivalently, from a random walk on an undirected graph. The estimates are in terms of the range of the function $f$, its variance, and the spectrum of the graph. The purpose of our estimates is to determine the number of chain/walk samples which are required for approximating the expectation of a distribution on vertices of a graph, especially an expander. The estimates must therefore provide information for a fixed number of samples (as in Gillman's [4]) rather than just asymptotic information. Our proofs are more elementary than other proofs in the literature, and our results are sharper. We obtain Bernstein- and Bennett-type inequalities, as well as an inequality for subgaussian variables.

1. Introduction 

One of the basic concerns of sampling theory is economising on the 'cost' and quantity of samples required to estimate the expectation of random variables. Drawing states by implementing a reversible Markov chain or, equivalently, by conducting a random walk is often considerably 'cheaper' than the standard Monte-Carlo procedure of drawing independent random states. Independence is indeed lost when sampling by a Markov chain; the empirical average, however, may converge to the actual average at a rate comparable to that of independent sampling. This form of sampling is especially useful in the context of random walks on expander graphs.

This approach plays an important role in statistical physics and in computer science (a concise summary of applications is provided in [5]). Results concerning the rate of convergence of empirical averages sampled by a random walk, which hold for a fixed number of samples (rather than just asymptotically), have been obtained by several authors, starting with Gillman's [4], followed by [3], [7] and [8] (for vector-valued functions consult [?]). Of these, only [7] and [8] allowed the variance to play a role in their estimates, as is the case in this paper.

This paper is a further step in this direction. We improve known Bernstein-type inequalities, and prove a new Bennett-type inequality and a new inequality for subgaussian variables. Our methods are much more elementary than the ones prevailing in the literature, as we do not apply Kato's perturbation theory to estimate eigenvalues.
Our results were motivated by applications relating to graphs with large spectral gaps (expanders) and tails which go far beyond the variance (large deviations), such as the recent [?]. Accordingly, our results are stated for the reversible discrete setting. Analogues for the continuous and non-reversible settings can be derived using the simple reduction techniques presented in Sections 3.2 and 3.3 of [7].



Let $G$ be a finite undirected, possibly weighted, connected graph with $N$ vertices (random walks on such graphs can represent any finite irreducible reversible Markov chain). Denote by $s$ the stationary distribution of the random walk on the graph or the equivalent Markov chain. Let $f$ be a function on the vertices of $G$, normalised to have absolute maximum 1 and mean 0 relative to the stationary distribution, namely $\sum_i f(i)s(i)=0$. Let $V=\sum_i f^2(i)s(i)$ denote the variance of $f$ with respect to the stationary distribution. We will think of functions on $G$ as vectors in $\mathbb{R}^N$ and vice versa, so where $u$ and $v$ are vectors, expressions such as $e^u$ and $uv$ will stand for coordinatewise operations.

Denote by $P$ the transition matrix of the Markov chain/random walk, such that $P_{ij}$ is the probability of moving from node/state $j$ to node/state $i$. Since the chain is reversible, the eigenvalues of this matrix are all real (see Section 3). By the Perron-Frobenius theorem the top eigenvalue is 1 (with $s$ as the only corresponding eigenvector up to scalar multiplication), and all other eigenvalues have absolute value smaller than or equal to 1. Let $\alpha<1$ be the maximum between the second largest eigenvalue of $P$ and zero, and let $\beta\le1$ be the second largest absolute value of an eigenvalue of $P$.
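The following Python sketch is not part of the paper. It is a small illustration, assuming NumPy is available, of how $s$, $P$, $\alpha$ and $\beta$ can be computed for a weighted graph given by a symmetric weight matrix `W`. The function name `walk_spectrum` and the example graphs are ours. The sketch relies on the fact that $P$ is similar to the symmetric matrix $D^{-1/2}WD^{-1/2}$, where $D$ is the diagonal matrix of weighted degrees.

```python
import numpy as np


def walk_spectrum(W):
    """Return (s, P, alpha, beta) for the random walk on a connected, weighted,
    undirected graph with symmetric nonnegative weight matrix W.

    Convention as in the text: P[i, j] is the probability of moving from j to i,
    so every column of P sums to 1.
    """
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=0)                  # weighted degrees
    P = W / d                          # column j divided by d[j]
    s = d / d.sum()                    # stationary distribution
    # D^{-1/2} W D^{-1/2} (D = diag(d)) is symmetric and similar to P,
    # so P has the same, real, eigenvalues.
    sym = W / np.sqrt(np.outer(d, d))
    lam = np.sort(np.linalg.eigvalsh(sym))[::-1]   # descending; lam[0] == 1
    alpha = max(lam[1], 0.0)
    beta = max(abs(lam[1]), abs(lam[-1]))
    return s, P, alpha, beta
```

For the 4-cycle, which is bipartite, this returns $\beta=1$ and $\alpha=0$. For the triangle it returns $\alpha=0$ and $\beta=\frac12$, which shows that the two parameters can differ.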

Given a starting distribution $q$, the random variables $X_0, X_1, \dots$ will denote the trajectory of the random walk or, equivalently, the states drawn from the Markov chain. $\mathbb{P}_q$ and $\mathbb{E}_q$ will stand for the probability and expectation, respectively, of events related to this walk. Let $S_n=\sum_{i=1}^n f(X_i)$. Our concern in this paper is tail estimates for the distribution of $S_n$.
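As an illustration only, here is a minimal simulation sketch of the quantity $\frac1n S_n$ whose tails are studied. It assumes NumPy, and `sample_average` is a hypothetical helper name. Note that $X_0$ is drawn from $q$ but is not included in $S_n$.

```python
import numpy as np


def sample_average(P, f, q, n, rng):
    """Simulate X_0 ~ q, X_1, ..., X_n and return S_n / n, where
    S_n = f(X_1) + ... + f(X_n); X_0 itself is not included in the sum.

    P[i, j] is the probability of moving from state j to state i.
    """
    N = len(q)
    x = rng.choice(N, p=q)
    total = 0.0
    for _ in range(n):
        x = rng.choice(N, p=P[:, x])   # column x is the law of the next state
        total += f[x]
    return total / n
```

For a mean-zero $f$, repeated calls produce empirical averages concentrated around 0. The tail estimates below quantify how fast this concentration happens.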

We will prove inequalities in terms of both $\alpha$ and $\beta$. Note that inequalities in terms of $\alpha$ 'cost' an additional multiplicative factor outside the exponent, whereas inequalities in terms of $\beta$ are useless in the case $\beta=1$ (i.e. bipartite graphs), and become relatively poor if $\alpha$ is small and $\beta$ is large, which may be the case.

Theorem 1. Define

In the above setting we get

Remark. Note that the results are the same up to the factor $\frac12$ in the exponent, the multiplicative factor $e^{2r}$ and the replacement of $\beta^2$ by $\alpha$.

Note also that when $\alpha$ goes to 0, which is effectively almost the case of independent sampling, $A(\alpha,r)$ also vanishes, and the term we get inside the exponent is the same as the term appearing in standard proofs of Bennett and Bernstein inequalities for independent variables.



The infimum is hard to compute, so we must optimise separately for different 
parameter regimes. 

First we use the above result to derive a Bennett-type inequality (cf. [2]). 

Corollary 2. In the above setting, 
where t>l,C a = 2f^ and provided that 7 < (t - 1)^V.

Our theorem also allows us to reproduce Lezaud's estimates from [7] with improved 
constants: 



Corollary 3. 



and 



¥ q (-S n > 7) < 
n 



1 l-a I 2 

f q (-S n > 7) < e!+» v+j 
n 



Remark. These inequalities imply 

1 



-S n > 7) < ew (1 " ah2 



n 



g "l+a2(V+ 7 ) 






ROY WAGNER 



for $\gamma<V$, and

K(-S n > 7) < ei (1 " ah 4= e ~ S (1 ~° h 
n V s 2 

for $\gamma\ge V$. If $\gamma$ is much larger or much smaller than $V$, the constant 8 can be decreased towards 4. Our method also allows one to improve the constant multiplying $\gamma$ in the denominator, but we will not include the details, because the modification to the proof is straightforward but cumbersome.

The Bennett-type bound improves upon this Bernstein-type result for $\gamma\gg V$, provided $\beta$ is small enough. This shows how a smaller $\beta$ reduces the number of required samples.



Finally, our technique can be adapted to situations where we have additional information on the distribution of $f$, such as subgaussian tails. Let $s$ denote here, by abuse of notation, the measure on the vertices of the graph which corresponds to the stationary distribution.

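Theorem 4. In the above setting assume also that $s(f>t)\le Ce^{-Kt^2}$ for positive $t$, and remove the assumption $|f|\le1$. Then

$$\mathbb{P}_q\Big(\frac1nS_n\ge\gamma\Big)\le\|q\|_{1/s}\big(2C\sqrt{\pi K}\,\gamma+2\big)^{n/2}e^{-\frac n2K\gamma^2}$$

as long as $\gamma\le\log\big(\frac1{2\beta}+\frac12\big)/2K\|f\|_\infty$.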
We also have 

F q (-S n > 7) < 
n 

as long as 7 < log + ±)/2K\\f\\ 00 . 

For some parameter regimes Theorem 4 asymptotically improves upon Theorem 12 of [?].





3. Proofs of results in terms of $\beta$

In this section we will prove the inequalities involving $\beta$. Sketches of proofs for inequalities involving $\alpha$ are deferred to the next section. Before we begin, we introduce some notation. We denote by $\|u\|_{1/s}=\big(\sum_i u_i^2/s(i)\big)^{1/2}$ the $\frac1s$-weighted $\ell_2$ norm on $\mathbb{R}^N$. The inner product associated with this norm is $\langle u,v\rangle=\sum_i u_iv_i/s(i)$. When we refer to the standard $\ell_2$ norm we will use the notation $\|\cdot\|_2$.

The transition matrix $P$ is not necessarily symmetric, and so its eigenvectors need not be orthogonal (this would be the case only if $G$ were a regular graph). Reversibility, however, guarantees that $s_jP_{ij}=s_iP_{ji}$. Hence $P$ is self-adjoint with respect to $\langle\cdot,\cdot\rangle$, and its eigenvectors are mutually orthogonal with respect to the $\frac1s$-weighted Euclidean structure. Therefore the $\|\cdot\|_{1/s}$ norm of $P$ restricted to the subspace orthogonal to $s$ is $\beta$, the second largest absolute value of the eigenvalues of $P$.
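A hedged numerical check of these facts follows. It is our own sketch, not the author's, and assumes NumPy. `inner` implements $\langle u,v\rangle=\sum_i u_iv_i/s(i)$. `weighted_op_norm` computes the $\|\cdot\|_{1/s}$ operator norm as the spectral norm of $D^{-1/2}AD^{1/2}$ with $D=\operatorname{diag}(s)$, which is valid because $\|u\|_{1/s}=\|D^{-1/2}u\|_2$.

```python
import numpy as np


def inner(u, v, s):
    """<u, v> = sum_i u_i v_i / s(i), the inner product of the 1/s-weighted l2 norm."""
    return float(np.sum(u * v / s))


def weighted_op_norm(A, s):
    """Operator norm of A for ||.||_{1/s}.

    Since ||u||_{1/s} = ||D^{-1/2} u||_2 with D = diag(s), this equals the
    spectral norm of D^{-1/2} A D^{1/2}.
    """
    r = np.sqrt(s)
    return float(np.linalg.norm(A * r[None, :] / r[:, None], 2))
```

On a random reversible chain one can then confirm three things: $\langle Pu,v\rangle=\langle u,Pv\rangle$; $\|P\|_{1/s}=1$; and $P$ composed with the orthogonal projection $u\mapsto u-\langle u,s\rangle s$ onto $s^\perp$ has norm $\beta$.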
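Proof of Theorem 1. The beginning of our proof is identical to that of Gillman's and of those which follow its reasoning. Take $r>0$. By Markov's inequality

$$\mathbb{P}_q\Big(\frac1nS_n\ge\gamma\Big)\le e^{-rn\gamma}\,\mathbb{E}_qe^{rS_n},$$

where the expectation can be directly expressed and estimated as

$$\mathbb{E}_qe^{rS_n}=\sum_{(x_0,\dots,x_n)\in G^{n+1}}q(x_0)\prod_{i=0}^{n-1}P_{x_{i+1}x_i}e^{rf(x_{i+1})}=\big\langle s,(e^{rf}P)^nq\big\rangle\le\|q\|_{1/s}\,\|Pe^{rf}\|_{1/s}^n.$$

In the last step we used $\|s\|_{1/s}=1$, and the fact that $e^{rf}P$ and $Pe^{rf}$ are adjoint to each other with respect to the weighted inner product and therefore have the same norm.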

Here $e^{rf}$ stands for the diagonal matrix with the entries $e^{rf(i)}$ on the diagonal, and the inner product is, we recall, the inner product associated with the $\frac1s$-weighted $\ell_2$ norm.
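To make the displayed identity concrete, the sketch below compares a brute-force sum over all paths $(x_0,\dots,x_n)$ with the operator expression $\langle s,(e^{rf}P)^nq\rangle$. It is a numerical illustration under our own assumptions, not part of the proof. It assumes NumPy, and the helper names are hypothetical. Because $\langle s,v\rangle=\sum_i v_i$ for the weighted inner product, the operator expression is just the sum of the entries of $(e^{rf}P)^nq$.

```python
import itertools

import numpy as np


def mgf_by_paths(P, f, q, r, n):
    """E_q exp(r S_n), computed by summing over all paths (x_0, ..., x_n)."""
    N = len(q)
    total = 0.0
    for path in itertools.product(range(N), repeat=n + 1):
        weight = q[path[0]]
        for i in range(n):
            # P[a, b] is the probability of moving from state b to state a
            weight *= P[path[i + 1], path[i]] * np.exp(r * f[path[i + 1]])
        total += weight
    return total


def mgf_by_operator(P, f, q, r, n):
    """<s, (e^{rf} P)^n q> for the 1/s-weighted inner product; since
    <s, v> = sum_i v_i, this is the sum of the entries of (e^{rf} P)^n q."""
    M = np.diag(np.exp(r * f)) @ P
    return float(np.sum(np.linalg.matrix_power(M, n) @ q))
```

The brute-force version grows like $N^{n+1}$, so it is only useful for checking the identity on tiny examples.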

At this point Gillman's proof and its variations symmetrise the operator so that its norm equals its top eigenvalue, and use Kato's spectral perturbation theory to estimate this eigenvalue. Our proof, on the other hand, simply estimates the norm directly. To do that we will use the equality

$$\|Pe^{rf}\|^2_{1/s}=\max_{\|u\|_{1/s}=1}\big\langle Pe^{rf}u,Pe^{rf}u\big\rangle.$$

In order to perform the computation we split the vector $u$ into stationary and orthogonal components, $u=as+bp$, where $p$ is normalised and orthogonal to $s$ in the weighted Euclidean structure. Applying similar decompositions $e^{rf}s=xs+z\sigma$ and $e^{rf}p=ys+w\tau$ we get

$$\|Pe^{rf}\|^2_{1/s}=\max_{a^2+b^2=1,\ p,\sigma,\tau}\big\langle a(xs+zP\sigma)+b(ys+wP\tau),\ a(xs+zP\sigma)+b(ys+wP\tau)\big\rangle.$$

We open the inner product and obtain

$$\|Pe^{rf}\|^2_{1/s}=\max_{a^2+b^2=1,\ p,\sigma,\tau}a^2\big(x^2+z^2\|P\sigma\|^2_{1/s}\big)+b^2\big(y^2+w^2\|P\tau\|^2_{1/s}\big)+2ab\big(xy+zw\langle P\sigma,P\tau\rangle\big).$$

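Denote $p_\sigma=\|P\sigma\|^2_{1/s}$, $p_\tau=\|P\tau\|^2_{1/s}$ and $p_{\sigma,\tau}=\langle P\sigma,P\tau\rangle$. Our task is reduced to computing the $\ell_2$ norm of the following 2 by 2 symmetric bilinear form:

$$\begin{pmatrix}x^2+z^2p_\sigma & xy+zwp_{\sigma,\tau}\\ xy+zwp_{\sigma,\tau} & y^2+w^2p_\tau\end{pmatrix}.$$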
Applying standard computations to derive the norm we get

$$\begin{aligned}\|Pe^{rf}\|^2_{1/s}&=\max_{p,\sigma,\tau}\frac12\Big[\big(x^2+y^2+z^2p_\sigma+w^2p_\tau\big)+\big((x^2+y^2-z^2p_\sigma-w^2p_\tau)^2+4z^2w^2(p_{\sigma,\tau}^2-p_\sigma p_\tau)\\&\qquad\qquad+4x^2z^2p_\sigma+4y^2w^2p_\tau+8xyzwp_{\sigma,\tau}\big)^{1/2}\Big]\\&\le\max_{p,\sigma,\tau}\frac12\Big[\big(x^2+y^2+z^2p_\sigma+w^2p_\tau\big)+\Big((x^2+y^2-z^2p_\sigma-w^2p_\tau)^2+4\big(|xz|\sqrt{p_\sigma}+|yw|\sqrt{p_\tau}\big)^2\Big)^{1/2}\Big],\end{aligned}$$

where we used the Cauchy-Schwarz inequality $p_{\sigma,\tau}^2\le p_\sigma p_\tau$.
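The "standard computation" is the closed form $\frac12\big[\mathrm{tr}+\sqrt{(a_{11}-a_{22})^2+4a_{12}^2}\big]$ for the top eigenvalue of a symmetric 2 by 2 matrix, written out in the expanded form above. The sketch below is our illustration, assuming NumPy for the comparison in the test. It checks the expanded formula against a direct eigenvalue computation, and also checks that the Cauchy-Schwarz step only increases the value.

```python
import math


def top_eig_formula(x, y, z, w, ps, pt, pst):
    """Largest eigenvalue of [[x^2+z^2 ps, xy+zw pst], [xy+zw pst, y^2+w^2 pt]],
    written in the expanded form used in the text."""
    trace = x**2 + y**2 + z**2 * ps + w**2 * pt
    disc = ((x**2 + y**2 - z**2 * ps - w**2 * pt) ** 2
            + 4 * z**2 * w**2 * (pst**2 - ps * pt)
            + 4 * x**2 * z**2 * ps + 4 * y**2 * w**2 * pt
            + 8 * x * y * z * w * pst)
    return 0.5 * (trace + math.sqrt(max(disc, 0.0)))   # guard against rounding


def top_eig_bound(x, y, z, w, ps, pt):
    """Upper bound after the Cauchy-Schwarz step p_{sigma,tau}^2 <= p_sigma p_tau."""
    a = x**2 + y**2 - z**2 * ps - w**2 * pt
    c = abs(x * z) * math.sqrt(ps) + abs(y * w) * math.sqrt(pt)
    trace = x**2 + y**2 + z**2 * ps + w**2 * pt
    return 0.5 * (trace + math.sqrt(a**2 + 4 * c**2))
```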



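To estimate the square root we use the inequality $\sqrt{1+X^2}\le1+\frac{X^2}{2}$. This leads us to

$$\|Pe^{rf}\|^2_{1/s}\le x^2+y^2+\frac{\big(|xz|\sqrt{p_\sigma}+|yw|\sqrt{p_\tau}\big)^2}{x^2+y^2-z^2p_\sigma-w^2p_\tau}.\tag{3}$$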

Note that this result depends on assuming that $x^2+y^2>z^2p_\sigma+w^2p_\tau$. (For the purposes of the proof of Theorem 4 we require the inequality

$$\|Pe^{rf}\|^2_{1/s}\le x^2+y^2+|xz|\sqrt{p_\sigma}+|yw|\sqrt{p_\tau},\tag{4}$$

which is obtained by using $\sqrt{1+X^2}\le1+|X|$, and depends on the same inequality.)
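A small sketch, ours rather than the paper's, evaluates the right-hand sides of (3) and (4). It checks on a grid of admissible parameters that both bounds dominate the exact top eigenvalue of the 2 by 2 form. Bound (3) raises an error when the assumption $x^2+y^2>z^2p_\sigma+w^2p_\tau$ fails, mirroring the caveat in the text.

```python
import math


def bound3(x, y, z, w, ps, pt):
    """Right-hand side of inequality (3); requires x^2 + y^2 > z^2 ps + w^2 pt."""
    a = x**2 + y**2 - z**2 * ps - w**2 * pt
    if a <= 0:
        raise ValueError("(3) needs x^2 + y^2 > z^2 p_sigma + w^2 p_tau")
    c = abs(x * z) * math.sqrt(ps) + abs(y * w) * math.sqrt(pt)
    return x**2 + y**2 + c**2 / a


def bound4(x, y, z, w, ps, pt):
    """Right-hand side of inequality (4)."""
    return x**2 + y**2 + abs(x * z) * math.sqrt(ps) + abs(y * w) * math.sqrt(pt)
```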



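Let us now estimate the components of our formula. We recall that $f$ has mean 0 with respect to the stationary distribution $s$ and absolute maximum 1, so that $\langle f^ks,s\rangle\le\langle f^2s,s\rangle=V$ for every $k\ge2$. We obtain

$$x=\langle e^{rf}s,s\rangle=1+\frac{\langle fs,s\rangle r}{1!}+\frac{\langle f^2s,s\rangle r^2}{2!}+\frac{\langle f^3s,s\rangle r^3}{3!}+\cdots\le1+V\Big(\frac{r^2}{2!}+\frac{r^3}{3!}+\cdots\Big)=1+V(e^r-1-r).$$

Note also that $|f|\le1$ implies that $x\le e^r$, and that by the arithmetic-geometric mean inequality $x=\sum_is(i)e^{rf(i)}\ge e^{r\sum_is(i)f(i)}=1$.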
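As a quick sanity check, the sketch below (assuming NumPy; the helper name is ours) computes $x=\sum_i s(i)e^{rf(i)}$ for a normalised mean-zero $f$, together with the two upper bounds $1+V(e^r-1-r)$ and $e^r$ derived above. It is meant only to illustrate the estimates, not to replace the argument.

```python
import math

import numpy as np


def x_and_bounds(f, s, r):
    """Return x = <e^{rf} s, s> = sum_i s(i) e^{r f(i)} together with the
    two upper bounds from the text, 1 + V (e^r - 1 - r) and e^r.
    Assumes sum_i f(i) s(i) = 0 and max |f| <= 1."""
    V = float(np.sum(f ** 2 * s))
    x = float(np.sum(s * np.exp(r * f)))
    return x, 1 + V * (math.exp(r) - 1 - r), math.exp(r)
```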

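To estimate $y=\langle e^{rf}p,s\rangle=\langle e^{rf}s,p\rangle$, recall that $p$ is normalised and orthogonal to $s$, and that $|\langle f^ks,p\rangle|\le\|f^ks\|_{1/s}\le\|fs\|_{1/s}=\sqrt V$ for every $k\ge1$. We get

$$|y|=|\langle e^{rf}s,p\rangle|=\Big|\langle s,p\rangle+\frac{\langle fs,p\rangle r}{1!}+\frac{\langle f^2s,p\rangle r^2}{2!}+\cdots\Big|\le\sqrt V\Big(r+\frac{r^2}{2!}+\frac{r^3}{3!}+\cdots\Big)=\sqrt V(e^r-1).$$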

Note that $x^2+y^2\le\|e^{rf}s\|^2_{1/s}=\langle e^{2rf}s,s\rangle$, which, as in the computation of $x$ above, is bounded by $1+V(e^{2r}-1-2r)$.

Next, using the same estimate as for $y$, we get $|z|=|\langle e^{rf}s,\sigma\rangle|\le\sqrt V(e^r-1)$. For $w=\langle e^{rf}p,\tau\rangle$ we use the estimate $|w|\le e^r$, which also applies to $x$.
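The estimate for $y$ (and, with $\sigma$ in place of $p$, for $z$) can be illustrated numerically. The following is a minimal sketch assuming NumPy, with a hypothetical helper name. It draws vectors $p$ orthogonal to $s$ with unit $\frac1s$-weighted norm and compares $|y|$ with $\sqrt V(e^r-1)$.

```python
import math

import numpy as np


def y_and_bound(f, s, p, r):
    """Return |y| = |<e^{rf} s, p>| and the bound sqrt(V) (e^r - 1).

    <u, v> = sum_i u_i v_i / s(i), so <e^{rf} s, p> = sum_i e^{r f(i)} p_i.
    p is assumed normalised and orthogonal to s in this inner product.
    """
    V = float(np.sum(f ** 2 * s))
    y = float(np.sum(np.exp(r * f) * p))
    return abs(y), math.sqrt(V) * (math.exp(r) - 1)
```

In the test, orthogonality to $s$ simply means $\sum_i p_i=0$, since $\langle p,s\rangle=\sum_i p_i$.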



TAIL ESTIMATES FOR SUMS OF VARIABLES SAMPLED BY A RANDOM WALK 



Finally, since the norm of $P$ restricted to the subspace orthogonal to $s$ is $\beta$, we have $p_\sigma,p_\tau,|p_{\sigma,\tau}|\le\beta^2$.

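Now we plug our estimates into inequality (3), and derive

$$\begin{aligned}\|Pe^{rf}\|^2_{1/s}&\le1+V(e^{2r}-1-2r)+\frac{\big(2e^r\sqrt V(e^r-1)\beta\big)^2}{1-\beta^2e^{2r}-\beta^2V(e^r-1)^2}\\&=1+V\Big(e^{2r}-1-2r+\frac{4\beta^2e^{2r}(e^r-1)^2}{1-\beta^2\big(e^{2r}+V(e^r-1)^2\big)}\Big)\\&\le\exp\Big(V\Big(e^{2r}-1-2r+\frac{4\beta^2e^{2r}(e^r-1)^2}{1-\beta^2\big(e^{2r}+V(e^r-1)^2\big)}\Big)\Big),\end{aligned}$$

as long as $1>\beta^2\big(e^{2r}+V(e^r-1)^2\big)$. To conclude, recall that

$$\mathbb{P}_q\Big(\frac1nS_n\ge\gamma\Big)\le e^{-n\gamma r}\,\|q\|_{1/s}\,\|Pe^{rf}\|^n_{1/s},$$

so we finally obtain

$$\mathbb{P}_q\Big(\frac1nS_n\ge\gamma\Big)\le\|q\|_{1/s}\inf_{r>0,\ 1>B(\beta^2,r)}e^{-n\left[\gamma r-\frac12V\left(e^{2r}-1-2r+A(\beta^2,r)\right)\right]},$$

where $A(\beta^2,r)$ is the fraction appearing above and $B(\beta^2,r)=\beta^2\big(e^{2r}+V(e^r-1)^2\big)$.

□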
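Since the infimum has no convenient closed form, one way to see what the $\beta$-bound gives in practice is a crude grid search over $r$. The sketch below is our own illustration, not the author's method. It reads $A$ and $B$ off the proof, so the exact form of $B$ is an assumption. It skips values of $r$ whose exponent is not positive, because those give nothing better than the trivial bound $\|q\|_{1/s}$.

```python
import math


def theorem1_beta_bound(gamma, V, beta, n, q_norm=1.0, r_max=5.0, steps=20000):
    """Grid-search evaluation of the beta-bound from the proof of Theorem 1:

        ||q||_{1/s} * inf_r exp(-n [gamma r - V/2 (e^{2r} - 1 - 2r + A(beta^2, r))]),

    over r in (0, r_max] with B(beta^2, r) < 1, where (assumed reading)
        B(l, r) = l (e^{2r} + V (e^r - 1)^2),
        A(l, r) = 4 l e^{2r} (e^r - 1)^2 / (1 - B(l, r)).
    """
    lam = beta ** 2
    best = q_norm                      # value approached as r -> 0
    for k in range(1, steps + 1):
        r = r_max * k / steps
        B = lam * (math.exp(2 * r) + V * (math.exp(r) - 1) ** 2)
        if B >= 1:
            break                      # B increases with r: no larger r is admissible
        A = 4 * lam * math.exp(2 * r) * (math.exp(r) - 1) ** 2 / (1 - B)
        exponent = gamma * r - 0.5 * V * (math.exp(2 * r) - 1 - 2 * r + A)
        if exponent > 0:               # otherwise the bound exceeds ||q||_{1/s}
            best = min(best, q_norm * math.exp(-n * exponent))
    return best
```

With $\beta=1$ no $r>0$ is admissible and the function returns the trivial value 1, in line with the remark that $\beta$-inequalities are useless for bipartite graphs. Doubling $n$ squares the bound, because the optimal $r$ does not depend on $n$.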



To derive the corollaries and Theorem 4 we only need to assign suitable values to $r$. We will restrict to the case $q=s$ (for which $\|q\|_{1/s}=1$) in order not to have to carry the $\|q\|_{1/s}$ term.

Proof of Corollary 2. Using the inequalities $(e^r-1)^2\le e^{2r}-1-2r$ and $e^{2r}+V(e^r-1)^2\le2e^{2r}-1$, we bound the expression inside the exponent in the $\beta$-inequality of Theorem 1 by

$$-\frac n2\Big[2\gamma r-V(e^{2r}-1-2r)\Big(1+\frac{4\beta^2e^{2r}}{1-\beta^2(2e^{2r}-1)}\Big)\Big].$$
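The following sketch is a numerical illustration under our own assumptions, not part of the proof. It checks that the simplified exponent used for Corollary 2 never exceeds the exponent of Theorem 1, with both written without the factor $-n$. This is exactly what is needed for the corollary's bound to remain valid. The first elementary inequality is equivalent to $1+r\le e^r$. The second follows from $V\le1$ and $(e^r-1)^2\le e^{2r}-1$.

```python
import math


def theorem1_exponent(gamma, V, beta, r):
    """gamma r - V/2 (e^{2r} - 1 - 2r + A(beta^2, r)), with A as in the proof above."""
    lam = beta ** 2
    B = lam * (math.exp(2 * r) + V * (math.exp(r) - 1) ** 2)
    A = 4 * lam * math.exp(2 * r) * (math.exp(r) - 1) ** 2 / (1 - B)
    return gamma * r - 0.5 * V * (math.exp(2 * r) - 1 - 2 * r + A)


def corollary2_exponent(gamma, V, beta, r):
    """1/2 [2 gamma r - V (e^{2r} - 1 - 2r)(1 + 4 beta^2 e^{2r} / (1 - beta^2 (2 e^{2r} - 1)))]."""
    lam = beta ** 2
    g = math.exp(2 * r) - 1 - 2 * r
    factor = 1 + 4 * lam * math.exp(2 * r) / (1 - lam * (2 * math.exp(2 * r) - 1))
    return 0.5 * (2 * gamma * r - V * g * factor)
```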
Proof of Corollary 3. First, we apply the inequalities $e^{2r}-1-2r\le2r^2e^{2r}$, $e^r-1\le re^r$ and $e^{2r}+V(e^r-1)^2\le2e^{2r}-1$ to the exponent in the $\beta$-inequality of Theorem 1. The exponent is then bounded by

$$-\frac n2\Big[2\gamma r-Vr^2e^{2r}\Big(2+\frac{4\beta^2e^{2r}}{1-\beta^2(2e^{2r}-1)}\Big)\Big]=-n\Big[\gamma r-\frac{Vr^2}{\big((1+\beta^2)e^{-2r}-2\beta^2\big)/(1+\beta^2)}\Big].$$
Now we set r = j^jp 2F7+V) and obtain the desired result. Note that using more careful estimates can lead to a sharper constant multiplying $\gamma$ in the denominator. □
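As a hedged check of the algebra in the proof of Corollary 3, the sketch below evaluates the bracketed quantity (the exponent divided by $-n$) in both of its displayed forms and confirms that they agree. It also tests the two elementary exponential inequalities used in the substitution. The function names are ours.

```python
import math


def corollary3_forms(gamma, V, beta, r):
    """The bracketed quantity in the proof of Corollary 3 (the exponent divided
    by -n), in its two displayed forms; requires beta^2 (2 e^{2r} - 1) < 1."""
    lam = beta ** 2
    e2 = math.exp(2 * r)
    first = 0.5 * (2 * gamma * r
                   - V * r ** 2 * e2 * (2 + 4 * lam * e2 / (1 - lam * (2 * e2 - 1))))
    second = gamma * r - V * r ** 2 * (1 + lam) / ((1 + lam) * math.exp(-2 * r) - 2 * lam)
    return first, second


def elementary_inequalities(r):
    """e^{2r} - 1 - 2r <= 2 r^2 e^{2r} and e^r - 1 <= r e^r, for r > 0."""
    return (math.exp(2 * r) - 1 - 2 * r <= 2 * r ** 2 * math.exp(2 * r)
            and math.exp(r) - 1 <= r * math.exp(r))
```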



Proof of Theorem 4. In this proof we will not assume that $|f|\le1$. For the purpose of this proof we offer a different analysis of the bound $x^2+y^2\le\sum_ie^{2rf(i)}s(i)$. This is simply the expectation of $e^{2rf}$ according to the measure $s$. We can now evaluate this quantity using the subgaussian information. We get



$$x^2+y^2\le\int_{-\infty}^{\infty}e^{2rt}\,d\big(-s(f>t)\big)\le1+\int_0^{\infty}2re^{2rt}s(f>t)\,dt\le1+\int_0^{\infty}2re^{2rt}Ce^{-Kt^2}\,dt\le1+2C\sqrt{\frac{\pi}{K}}\,re^{r^2/K}.$$
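The last step extends the Gaussian-type integral to the whole line, where it equals $2C\sqrt{\pi/K}\,re^{r^2/K}$. The sketch below is an illustration with our own function names and a truncation point `t_max` chosen by us. It approximates the integral over $[0,\infty)$ with the trapezoid rule and checks that it stays below this closed form.

```python
import math


def truncated_integral(r, C, K, t_max=30.0, steps=100_000):
    """Trapezoid-rule approximation of int_0^{t_max} 2 r e^{2 r t} C e^{-K t^2} dt."""
    h = t_max / steps
    total = 0.0
    for k in range(steps + 1):
        t = k * h
        value = 2 * r * C * math.exp(2 * r * t - K * t * t)
        total += value / 2 if k in (0, steps) else value
    return total * h


def gaussian_bound(r, C, K):
    """2 C sqrt(pi / K) r e^{r^2 / K}, the value of the same integral over the whole line."""
    return 2 * C * math.sqrt(math.pi / K) * r * math.exp(r * r / K)
```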



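Plugging this estimate into inequality (4), together with the simple estimates $x,w\le e^{r\|f\|_\infty}$ and $|y|,|z|\le e^{r\|f\|_\infty}-1$, we obtain

$$\|Pe^{rf}\|^2_{1/s}\le1+2C\sqrt{\frac{\pi}{K}}\,re^{r^2/K}+2\beta\big(e^{2r\|f\|_\infty}-e^{r\|f\|_\infty}\big).$$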



As noted, inequality (4) depends on taking $x^2+y^2\ge z^2\beta^2+w^2\beta^2$, which is guaranteed as long as $\beta^2(2e^{2r\|f\|_\infty}-1)\le1$. We will make the stronger assumption $\beta(2e^{2r\|f\|_\infty}-1)\le1$, and obtain the bound

$$\|Pe^{rf}\|^2_{1/s}\le\Big(2C\sqrt{\frac{\pi}{K}}\,r+2\Big)e^{r^2/K}.$$



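Recalling that

$$\mathbb{P}_q\Big(\frac1nS_n\ge\gamma\Big)\le e^{-n\gamma r}\,\|q\|_{1/s}\,\|Pe^{rf}\|^n_{1/s}$$

and setting $r=\gamma K$, we conclude the required

$$\mathbb{P}_q\Big(\frac1nS_n\ge\gamma\Big)\le\|q\|_{1/s}\,e^{-n\gamma^2K}\big(2C\sqrt{\pi K}\,\gamma+2\big)^{n/2}e^{\frac n2\gamma^2K}=\|q\|_{1/s}\big(2C\sqrt{\pi K}\,\gamma+2\big)^{n/2}e^{-\frac n2K\gamma^2}.$$

The condition $\beta(2e^{2r\|f\|_\infty}-1)\le1$ now reduces to $\gamma\le\log\big(\frac1{2\beta}+\frac12\big)/2K\|f\|_\infty$.

□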
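For orientation, here is a small sketch, not from the paper, that evaluates the subgaussian bound as reconstructed above together with the admissible range of $\gamma$. The bound is computed in log space so that large $n$ does not overflow. All names are hypothetical, and the constants follow the derivation above rather than any published statement.

```python
import math


def theorem4_bound(gamma, n, C, K, q_norm=1.0):
    """||q||_{1/s} (2 C sqrt(pi K) gamma + 2)^{n/2} exp(-n K gamma^2 / 2),
    evaluated in log space to avoid overflow for large n."""
    log_val = (0.5 * n * math.log(2 * C * math.sqrt(math.pi * K) * gamma + 2)
               - 0.5 * n * K * gamma ** 2)
    return q_norm * math.exp(log_val)


def theorem4_gamma_max(beta, K, f_sup):
    """Largest gamma with beta (2 exp(2 gamma K ||f||_inf) - 1) <= 1, i.e.
    log(1/(2 beta) + 1/2) / (2 K ||f||_inf)."""
    return math.log(1 / (2 * beta) + 0.5) / (2 * K * f_sup)
```

The bound only becomes informative once $K\gamma^2$ exceeds $\log(2C\sqrt{\pi K}\,\gamma+2)$, which is the large-deviation regime the theorem targets.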



Remark. Note that our method allows one to increase $\gamma$ as far as $\log\big(\frac1{2\beta^2}+\frac12\big)/2K\|f\|_\infty$, where our estimate becomes trivial.
4. Proofs of results in terms of $\alpha$



The differences between the proofs of results in terms of $\beta$ and $\alpha$ are mostly computational, so I will only sketch the relevant differences.

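Proof of Theorem 1. As above, our task is to estimate $\|(Pe^{rf})^n\|_{1/s}$. We will use the simple identity $Pe^{rf}=e^{-\frac r2f}\big(e^{\frac r2f}Pe^{\frac r2f}\big)e^{\frac r2f}$ to obtain

$$\|(Pe^{rf})^n\|_{1/s}\le e^r\big\|\big(e^{\frac r2f}Pe^{\frac r2f}\big)^n\big\|_{1/s}\le e^r\big\|e^{\frac r2f}Pe^{\frac r2f}\big\|^n_{1/s},$$

since, by $|f|\le1$, the diagonal operators $e^{\pm\frac r2f}$ have norm at most $e^{r/2}$.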
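A numerical illustration of this step, under our own assumptions (NumPy; hypothetical helper names), follows. On a random reversible chain it checks three things: the conjugation identity $Pe^{rf}=e^{-\frac r2f}Me^{\frac r2f}$ with $M=e^{\frac r2f}Pe^{\frac r2f}$; the norm bound $\|(Pe^{rf})^n\|_{1/s}\le e^r\|M\|^n_{1/s}$; and the fact that the weighted norm of the self-adjoint operator $M$ equals its top eigenvalue.

```python
import numpy as np


def weighted_op_norm(A, s):
    """Operator norm for ||u||_{1/s}: spectral norm of D^{-1/2} A D^{1/2}, D = diag(s)."""
    r = np.sqrt(s)
    return float(np.linalg.norm(A * r[None, :] / r[:, None], 2))


def half_conjugated(P, f, r):
    """Return (M, h) with h = diag(e^{rf/2}) and M = h P h, so that P e^{rf} = h^{-1} M h."""
    h = np.diag(np.exp(r * f / 2))
    return h @ P @ h, h
```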

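Since the operator $e^{\frac r2f}Pe^{\frac r2f}$ is self-adjoint with respect to the weighted Euclidean structure, we have

$$\big\|e^{\frac r2f}Pe^{\frac r2f}\big\|_{1/s}=\max_{\|u\|_{1/s}=1}\big\langle e^{\frac r2f}Pe^{\frac r2f}u,u\big\rangle=\max_{\|u\|_{1/s}=1}\big\langle Pe^{\frac r2f}u,e^{\frac r2f}u\big\rangle.$$

(The maximum of the quadratic form gives the norm because the top eigenvalue of this operator dominates all others in absolute value, by the Perron-Frobenius theorem applied to the similar nonnegative matrix $e^{rf}P$.)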

Decomposing the vectors as in the $\beta$-case (with $\frac r2$ replacing $r$) we get

$$\big\|e^{\frac r2f}Pe^{\frac r2f}\big\|_{1/s}=\max_{a^2+b^2=1,\ p,\sigma,\tau}\big\langle a(xs+zP\sigma)+b(ys+wP\tau),\ a(xs+z\sigma)+b(ys+w\tau)\big\rangle.$$

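We open the inner product and obtain

$$\big\|e^{\frac r2f}Pe^{\frac r2f}\big\|_{1/s}=\max_{a^2+b^2=1,\ p,\sigma,\tau}a^2\big(x^2+z^2\langle P\sigma,\sigma\rangle\big)+b^2\big(y^2+w^2\langle P\tau,\tau\rangle\big)+2ab\big(xy+zw\langle P\sigma,\tau\rangle\big).$$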

Our task is reduced to computing the $\ell_2$ norm of the same 2 by 2 symmetric bilinear form as in the $\beta$-case, except that $r$ is replaced by $\frac r2$, and the definitions of the $p$'s are now $p_\sigma=\langle P\sigma,\sigma\rangle$, $p_\tau=\langle P\tau,\tau\rangle$ and $p_{\sigma,\tau}=\langle P\sigma,\tau\rangle$.

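The following identity still holds:

$$\begin{aligned}\big\|e^{\frac r2f}Pe^{\frac r2f}\big\|_{1/s}&=\max_{p,\sigma,\tau}\frac12\Big[\big(x^2+y^2+z^2p_\sigma+w^2p_\tau\big)+\big((x^2+y^2-z^2p_\sigma-w^2p_\tau)^2+4z^2w^2(p_{\sigma,\tau}^2-p_\sigma p_\tau)\\&\qquad\qquad+4x^2z^2p_\sigma+4y^2w^2p_\tau+8xyzwp_{\sigma,\tau}\big)^{1/2}\Big].\end{aligned}$$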

This time, however, the treatment of the terms inside the square root is slightly more delicate. Let $\lambda_i$ be the eigenvalues of $P$ in descending order, and let $(\sigma^i)_i$ and $(\tau^i)_i$ be the coordinates of $\sigma$ and $\tau$ respectively in terms of the associated orthonormal system. Define

$$p^+_\sigma=\sum_{1>\lambda_i>0}\lambda_i(\sigma^i)^2\qquad\text{and}\qquad p^-_\sigma=-\sum_{\lambda_i<0}\lambda_i(\sigma^i)^2,$$

and decompose $p_\tau$ and $p_{\sigma,\tau}$ analogously. By Cauchy-Schwarz $|p^+_{\sigma,\tau}|\le\sqrt{p^+_\sigma p^+_\tau}$, and the same goes for the $p^-$'s.
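To make the decomposition concrete, the sketch below computes $p^+_\sigma$ and $p^-_\sigma$ from an eigendecomposition. It is our illustration, assuming NumPy and a reversible $P$. The eigenvectors of the symmetric matrix $D^{-1/2}PD^{1/2}$ ($D=\operatorname{diag}(s)$) correspond to a $\frac1s$-orthonormal eigenbasis of $P$, in which the coordinates of $\sigma$ are $E^{\mathsf T}(\sigma/\sqrt s)$.

```python
import numpy as np


def p_plus_minus(P, s, sigma):
    """Split p_sigma = <P sigma, sigma> into p^+ - p^- using an eigenbasis of P
    that is orthonormal for <u, v> = sum_i u_i v_i / s(i)."""
    r = np.sqrt(s)
    S = P * r[None, :] / r[:, None]          # D^{-1/2} P D^{1/2}: symmetric if P is reversible
    lam, E = np.linalg.eigh((S + S.T) / 2)   # symmetrise to remove rounding asymmetry
    coords = E.T @ (sigma / r)               # coordinates of sigma in that basis
    pos = (lam > 0) & (lam < 1 - 1e-9)       # 1 > lambda_i > 0
    neg = lam < 0
    p_plus = float(np.sum(lam[pos] * coords[pos] ** 2))
    p_minus = float(-np.sum(lam[neg] * coords[neg] ** 2))
    return p_plus, p_minus
```

For $\sigma$ normalised and orthogonal to $s$, one expects three relations: $p^+_\sigma-p^-_\sigma=\langle P\sigma,\sigma\rangle$; $p^+_\sigma\le\alpha$; and $p^+_\sigma+\alpha p^-_\sigma\le\alpha$. The last is used further on.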
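All this yields

$$\begin{aligned}p_{\sigma,\tau}^2-p_\sigma p_\tau&=(p^+_{\sigma,\tau}-p^-_{\sigma,\tau})^2-(p^+_\sigma-p^-_\sigma)(p^+_\tau-p^-_\tau)\\&=\big((p^+_{\sigma,\tau})^2-p^+_\sigma p^+_\tau\big)+\big((p^-_{\sigma,\tau})^2-p^-_\sigma p^-_\tau\big)+p^+_\sigma p^-_\tau+p^-_\sigma p^+_\tau-2p^+_{\sigma,\tau}p^-_{\sigma,\tau}\\&\le\Big(\sqrt{p^+_\sigma p^-_\tau}+\sqrt{p^-_\sigma p^+_\tau}\Big)^2\end{aligned}$$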

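and

$$\begin{aligned}x^2z^2p_\sigma+y^2w^2p_\tau+2xyzwp_{\sigma,\tau}&=\big(x^2z^2p^+_\sigma+y^2w^2p^+_\tau+2xyzwp^+_{\sigma,\tau}\big)-\big(x^2z^2p^-_\sigma+y^2w^2p^-_\tau+2xyzwp^-_{\sigma,\tau}\big)\\&\le\Big(|xz|\sqrt{p^+_\sigma}+|yw|\sqrt{p^+_\tau}\Big)^2.\end{aligned}$$

We now combine the two estimates to get

$$4z^2w^2(p_{\sigma,\tau}^2-p_\sigma p_\tau)+4x^2z^2p_\sigma+4y^2w^2p_\tau+8xyzwp_{\sigma,\tau}\le4\max\big(|xz|^2,|yw|^2,|zw|^2\big)\Big(\Big(\sqrt{p^+_\sigma p^-_\tau}+\sqrt{p^-_\sigma p^+_\tau}\Big)^2+\Big(\sqrt{p^+_\sigma}+\sqrt{p^+_\tau}\Big)^2\Big).$$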

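Since $\lambda_2\le\alpha$, all the $p^+$'s are bounded by $\alpha$. Note also that $p^+_\sigma+\alpha p^-_\sigma\le\alpha\|\sigma\|^2_{1/s}=\alpha$, and the same goes for $\tau$. So, in fact, the above is bounded by the expression

$$4\max\big(|xz|^2,|yw|^2,|zw|^2\big)\Big(\Big(\sqrt{p^+_\sigma(1-p^+_\tau/\alpha)}+\sqrt{p^+_\tau(1-p^+_\sigma/\alpha)}\Big)^2+\Big(\sqrt{p^+_\sigma}+\sqrt{p^+_\tau}\Big)^2\Big).$$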

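Rearranging terms and using Cauchy-Schwarz we get

$$\begin{aligned}&\Big(\sqrt{p^+_\sigma(1-p^+_\tau/\alpha)}+\sqrt{p^+_\tau(1-p^+_\sigma/\alpha)}\Big)^2+\Big(\sqrt{p^+_\sigma}+\sqrt{p^+_\tau}\Big)^2\\&\qquad\le2\Big(p^+_\sigma(1-p^+_\tau/\alpha)+p^+_\tau+\sqrt{\big(p^+_\sigma+p^+_\tau(1-p^+_\sigma/\alpha)\big)\big(p^+_\tau+p^+_\sigma(1-p^+_\tau/\alpha)\big)}\Big)\le4\alpha.\end{aligned}$$

The last step holds because $p^+_\sigma(1-p^+_\tau/\alpha)+p^+_\tau$ and both factors under the square root equal $p^+_\sigma+p^+_\tau-p^+_\sigma p^+_\tau/\alpha=\alpha-(\alpha-p^+_\sigma)(\alpha-p^+_\tau)/\alpha\le\alpha$.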
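A brute-force check of the final $4\alpha$ estimate over a grid of admissible values $0\le p^+_\sigma,p^+_\tau\le\alpha$ is sketched below. The variables `a` and `b` are our stand-ins for $p^+_\sigma$ and $p^+_\tau$. This is an illustration, not a proof. Equality is attained at $p^+_\sigma=p^+_\tau=\alpha$.

```python
import math


def lhs_4alpha(a, b, alpha):
    """(sqrt(a(1-b/alpha)) + sqrt(b(1-a/alpha)))^2 + (sqrt(a) + sqrt(b))^2,
    where 0 <= a, b <= alpha stand for p^+_sigma and p^+_tau."""
    u = math.sqrt(max(0.0, a * (1 - b / alpha)))   # max() guards against rounding
    v = math.sqrt(max(0.0, b * (1 - a / alpha)))
    return (u + v) ** 2 + (math.sqrt(a) + math.sqrt(b)) ** 2
```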
So we finally obtain

$$\big\|e^{\frac r2f}Pe^{\frac r2f}\big\|_{1/s}\le\frac12\Big[\big(x^2+y^2+z^2p_\sigma+w^2p_\tau\big)+\Big((x^2+y^2-z^2p_\sigma-w^2p_\tau)^2+16\alpha\max\big(|xz|^2,|yw|^2,|zw|^2\big)\Big)^{1/2}\Big].$$

Using the same estimates as in the $\beta$-case, replacing $r$ by $\frac r2$ in the estimates of $x$, $y$, $z$ and $w$, recalling that $p_\sigma,p_\tau\le\alpha$, and finally changing the bound variable $r$ into $2r$, we obtain the desired results. □

The other proofs derive from the remark following Theorem 1, which applies to the proof of Theorem 4 as well.